This report is one in a series concerned with the possible
applications of using voice recognition technology in command
and control tasks.

EFFECT OF OPERATOR MENTAL
LOADING ON VOICE RECOGNITION SYSTEM PERFORMANCE

A.  OBJECTIVE  AND  BACKGROUND 

The  objective  of  this  experiment  was  to  determine  if  operator 
mental  workload  affected  the  performance  of  a  voice  recognition 
system  comprised  of  a  human  operator  and  a  discrete  utterance 
voice  recognition  device.   Specifically,  the  question  addressed 
was:   Would  increased  operator  mental  workload  (with  respect  to 
that experienced during training of the recognition device)
result in changes in his speech which would in turn result in
degraded performance of the voice recognition system?  A special
vocabulary  was  used  to  ensure  a  baseline  error  rate  with  which 
to  compare  various  mental  loading  levels.   As  such,  it  was 
expected  that  absolute  error  rates  would  be  higher  than  those 
normally  realized  in  real  world  operations.   This  experiment  with 
mental  loading  has  an  integral  relationship  to  previous  motor 
loading research by Armstrong (1980).

B.  SUBJECTS 

Twenty-four  subjects  participated  on  a  volunteer  basis  with 
no  monetary  or  other  incentive.   Twenty-two  of  the  subjects  were 
students  at  the  Naval  Postgraduate  School  (NPS)  and  two  were 
military  staff  members  at  NPS.   They  included  22  male  military 
officers  representing  the  United  States  Navy,  Army,  Air  Force, 
Marine Corps and Coast Guard; one female civilian from the
United States National Security Agency; and one male military
officer of the Canadian Forces.  All subjects were between the
ages  of  27  and  43  inclusive  and  the  ranks  of  the  military  officers 
ranged  from  Lieutenant  to  Commander  and  from  Captain  to 
Lieutenant-Colonel  inclusive. 

Sixteen of the subjects, designated "little experience",
had been subjects in a previous experiment by Poock (1980) and
had between two and ten hours of experience (mean 6.2 hours) on
the voice recognition system used in the experiment; eight,
designated "no experience", had no experience on this equipment.
Only two of the subjects had experience - one half hour each -
on the Response Analysis Tester which was used to simulate
operator mental loading.

C.   EQUIPMENT  USED 

1.   Response  Analysis  Tester  (RATER) 

The  General  Dynamics  Response  Analysis  Tester  (RATER,  Model  3) 
shown  in  figure  1  was  used  to  simulate  operator  mental  loading. 
Brady (1968) described the RATER as a "psychomotor testing
instrument designed to provide sensitive, reliable measurement of
any  impairment  of  response  speed/accuracy  and  short-term  memory 
for  patterned  or  color  stimuli."   Long  and  Fishburne  (1973) 
provide  normative  RATER  performance  data  for  a  student  naval 
aviator  population  and  reference  several  studies  in  which  the 
RATER  was  used.   Newsom,  Brady  and  O'Laughlin's  study  (1966) 
of  performance  in  a  revolving  space  station  simulator  found  that 
turning  the  head  while  in  a  rotating  environment  resulted  in 
degraded  short  term  memory  as  measured  on  the  RATER. 

The RATER consisted of a small subject console which
contained a display window and four response buttons in a two by
two  arrangement  and  a  larger  experimenter  console  which  contained 
the  controls  and  digital  counters.   These  counters  were  used  in 
the  derivation  of  subject  RATER  performance  data. 

The  RATER  was  used  to  generate  and  display  random  sequences 
of  four  individual  symbols  -  a  triangle,  a  circle,  a  cross  and  a 
diamond  -  in  the  window  of  the  subject  console.   Symbols  were 
presented  at  a  constant  rate  of  one  symbol  every  1.5  seconds. 
A  response  button  on  the  subject  console  was  associated  with  each 
of  the  four  symbols  and  labelled  accordingly. 

Three different RATER "delay" modes were used - delay zero,
delay one and delay two.  While the nth stimulus of the sequence,
St(n), was being displayed and before St(n+1) replaced it, the
subject was required to press the correct response button in
order to score a correct response.  In delay zero the correct
response button was the one which corresponded to the symbol
comprising St(n).  In delay one the correct response button for
the nth stimulus was the one which corresponded to the symbol
comprising St(n-1); in delay two the correct response button for
the nth stimulus was the one which corresponded to the symbol
comprising St(n-2).  In other words, in delay zero the subject
responded with the symbol being displayed.  In delay one, the
correct response was the symbol which had appeared on the
previous trial.  In delay two, the correct response was the
stimulus symbol which had been presented two trials earlier,
i.e. the subject had to remember two back instead of one back
(delay one) or none back (delay zero).
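
The three delay modes can be summarized in a short sketch.  The
Python below is illustrative only and is not part of the original
apparatus.  The function names and the 0-based indexing are
assumptions, as is the treatment of the first one or two stimuli
in the delay one and delay two modes (no correct response is taken
to exist yet); the report does not state how those were scored.

```python
from typing import List, Optional, Sequence

# The four RATER symbols named in the text.
SYMBOLS = ("triangle", "circle", "cross", "diamond")


def correct_response(stimuli: Sequence[str], n: int, delay: int) -> Optional[str]:
    """Return the symbol whose button is correct while stimulus n is shown.

    n is a 0-based position in the sequence (an indexing choice made
    here); delay is 0, 1 or 2.  Returns None when no stimulus exists
    `delay` positions back (assumed handling, not from the report).
    """
    if delay not in (0, 1, 2):
        raise ValueError("delay must be 0, 1 or 2")
    target = n - delay
    return stimuli[target] if target >= 0 else None


def count_correct(stimuli: Sequence[str],
                  presses: List[Optional[str]],
                  delay: int) -> int:
    """Count presses that match the correct response; None means no press."""
    correct = 0
    for n, pressed in enumerate(presses):
        expected = correct_response(stimuli, n, delay)
        if expected is not None and pressed == expected:
            correct += 1
    return correct
```

For the sequence circle, cross, diamond, triangle, the correct
button while "diamond" is displayed is "diamond" in delay zero,
"cross" in delay one and "circle" in delay two.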

The RATER was used solely as a device to load the subjects
mentally, i.e. to load the subjects through tasking which was
primarily decision-making in nature.  The choice of stimuli
presentation rate and delay modes was based on experience gained
during a pilot study, the findings of other researchers,
especially Long and Fishburne (1973), and the expected lack of
RATER experience of the subjects.

2.   Voice Recognition System and Choice of Vocabulary

A  Threshold  Technology  Inc.  Model  T600  discrete  utterance 
voice  recognition  system  (which  will  hereafter  be  referred  to 
as  the  T600)  was  used  as  the  equipment  component  of  the  combined 
equipment  plus  human  operator  voice  recognition  system.   The 
vocabulary  used  in  this  experiment  consisted  of  50  different 
utterances.   Thirty  were  single  words  selected  by  the  experimenter 
from  the  Listener's  Answer  Sheets  of  the  Modified  Rhyme  Test, 
one  of  the  four  test  types  which  have  been  commonly  used  in 
measuring  intelligibility  in  speech  communication  (Kryter,  1972). 
Sixteen  of  these  30  words  were  eight  pairs  of  rhyming  words  which, 
within each pair, differed only with respect to initial consonant -
for  example,  "beat"  and  "peat".   The  other  14  words  were  seven 
pairs  of  non-rhyming  but  similar  words  which,  within  each  pair, 
differed  only  with  respect  to  final  consonant  -  for  example, 
"sap"  and  "sat".   The  other  20  utterances  were  chosen  by  the 
experimenter  from  single  words  commonly  used  in  Command  and 
Control environments; they were chosen to be more easily
distinguished from each other and from the other 30 words of the
vocabulary. 


All  words  of  the  vocabulary  were  one  or  two  syllables  in 
length.   Short  words  were  deliberately  selected  to  facilitate 
generation  of  as  many  T600  word  recognition  attempts  as  possible 
in  the  limited  time  that  each  volunteer  subject  was  available. 
The  vocabulary  is  listed  by  word  type  in  Appendix  A.   A  listing 
in  the  order  in  which  the  words  were  trained  is  attached  to  the 
written  instructions  initially  given  to  subjects  and  is  contained 
in  Appendix  C. 

This  particular  vocabulary  was  chosen  to  increase  the 
likelihood  of  recognition  errors  by  the  T600  for  the  following 
reason.  (T600 recognition errors (RE's) are operationally
defined in the Dependent Variables section.)  Recognition accuracy
with  older  Threshold  Technology  Inc.  voice  recognition  equipment 
similar  to  the  T600  and  using  more  normal  vocabularies  (i.e. 
comprised  entirely  of  more  easily  distinguished  words)  has  often 
been  better  than  99%,  as  for  example,  in  the  studies  by  Martin 
and Grunza (1974), Scott (1975) and Scott (1978).  This level
of accuracy would produce an average of about one RE or fewer
per 100 spoken utterances.  It was anticipated that if operator
mental loading did affect recognition accuracy then the effect
would be relatively small and, due to the discrete nature of RE's,
would probably not be easily distinguishable if only one RE per
100 utterances were being observed - for example, a 20% increase
in RE's would probably not produce enough additional RE
observations to be statistically distinguishable from inherent
random variation.  However, if a vocabulary could be chosen to
produce approximately ten RE's per 100 utterances, a 20% increase
in RE's should be more easily distinguishable as this would
result in an average observation of 12 RE's per hundred
utterances.
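
The reasoning above can be illustrated numerically.  The sketch
below is not the analysis used in this research; it is a rough
normal-approximation comparison of two independent binomial error
rates, with the function name chosen here for illustration.  The
figure of 2400 utterances per condition is derived from the design
(24 subjects, each speaking two orderings of the 50 words per
condition).  Treating the utterances as independent is a
simplifying assumption that ignores the repeated-measures
structure.

```python
import math


def z_for_increase(base_rate: float, relative_increase: float,
                   n_utterances: int) -> float:
    """Approximate z statistic for a relative increase in an error rate.

    Compares base_rate with base_rate * (1 + relative_increase), each
    observed over n_utterances independent utterances (an assumption).
    """
    p1 = base_rate
    p2 = base_rate * (1.0 + relative_increase)
    standard_error = math.sqrt(p1 * (1.0 - p1) / n_utterances
                               + p2 * (1.0 - p2) / n_utterances)
    return (p2 - p1) / standard_error


# 24 subjects x 100 utterances per condition.
N_PER_CONDITION = 24 * 100

# About one RE per 100 utterances (ordinary vocabulary).
z_ordinary = z_for_increase(0.01, 0.20, N_PER_CONDITION)
# About ten RE's per 100 utterances (special vocabulary).
z_special = z_for_increase(0.10, 0.20, N_PER_CONDITION)

print(f"baseline  1%: z = {z_ordinary:.2f}")
print(f"baseline 10%: z = {z_special:.2f}")
```

Under these assumptions the same 20% relative increase gives a z
value more than three times larger at the ten percent baseline
than at the one percent baseline, which is the motivation for the
special vocabulary.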

An  alternative  method  of  detecting  a  small  expected  change 
in recognition accuracy would be to increase the number of
utterances spoken by the subjects.  This was not considered
feasible  here  because  of  the  greatly  increased  time  which  would 
be  required  of  each  of  the  volunteer  subjects;  the  experimental 
design  used  required  between  1.5  and  two  hours  per  subject.   For 
this  reason  the  former  method,  special  vocabulary,  was  used. 

3.   Arrangement  of  Equipment  Used 

Figure  2  illustrates  the  functional  relationships  among  the 
various  experimental  devices  used  in  the  experiment.   A  photograph 
of  the  experimenter  control  station  is  shown  in  figure  3.   The 
subjects  were  seated  one  at  a  time  in  an  Industrial  Acoustics 
Co.  Inc.  Controlled  Acoustic  Environments  booth.   The  subject 
console  of  the  RATER  was  on  a  table  in  front  of  the  subject. 

A  Maico  Model  MA-24B  Dual  Channel  Research  and  Diagnostic 
Audiometer  and  headsets  were  used  to  provide  oral  communication 
between  the  subject  and  the  experimenter.   The  experimenter  could 
speak  to  the  subject  by  depressing  a  "talk-over"  switch.   Another 
microphone, placed in the booth, was live at all times and
permitted the experimenter to hear what was happening in the
booth - in particular, what the subject said.  A Sony model TC 124
cassette tape recorder was connected to permit simultaneous
recording of the signals detected by the booth microphone and the
signals that the subject received over his headset.

The special T600 system noise-cancelling microphone was
mounted  on  the  subject's  headset  and  connected  only  to  the  T600. 
The  microphone  ON/OFF  switch  was  located  outside  of  the  booth. 

A  Computer  Devices  Inc.  Model  1203  Miniterm  portable  terminal 
was  connected  to  the  T600  system  in  such  a  manner  that  when  the 
T600 recognized an utterance the output string for that
utterance was typed at the terminal.  The T600 was programmed so that
the  ASCII  output  stream  associated  with  each  utterance  of  the 
vocabulary  was  simply  the  letters  spelling  the  utterance  followed 
by  a  carriage  return  and  a  line  feed;  thus,  for  example,  if  in 
the  recognition  mode  the  T600  "thought"  that  a  subject  said 
"attack",  the  word  "attack"  was  displayed  on  the  CRT  on  a  separate 
line  and  printed  at  the  terminal,  also  on  a  separate  line.   This 
provided  the  experimenter  with  a  paper  printout  of  T600  recognition 
activity which, with the correct utterances recorded on the
cassette tape recorder, permitted thorough analysis of the data.
Accurate,  manual,  real-time  analysis  by  the  experimenter  using 
only  the  T600  CRT  was  infeasible  primarily  because  of  the  rate 
at  which  the  T600  was  required  to  process  signals  for  recognition  - 
one  word  every  three  seconds. 

An  Akai  model  4000DS  Mk  II  reel-to-reel  tape  recorder  was 
connected  to  the  Maico  Audiometer  and  used  to  present  stimuli 
to  the  subject. 

D.   EXPERIMENTAL  PROCEDURE 

Subjects  were  tested  one  at  a  time  during  normal  working 
hours.   They  were  first  required  to  complete  the  Subject  Data 
Form (Appendix B) and then read three pages of written instructions
(Appendix  C)  which  briefly  introduced  the  experiment  and  provided 
general  guidelines  on  inputting  voice  data  to  the  T600.   Remaining 
instructions to the subject were given orally by the experimenter.

"No experience" subjects only were next given a brief
demonstration of the operation of the T600.  For this stage the
T600 microphone and the headset on which it was mounted were
removed from the booth and the microphone was reconnected outside
of the booth so that the subject could immediately see what
happened when speech signals were input to the T600.  The
importance of the guidelines which the subject had just read was
demonstrated during this stage and the subject was allowed to
familiarize himself with the T600 for about five minutes.

The  T600  microphone  and  the  headset  on  which  it  was  mounted 
were  then  reconnected  inside  the  booth.   (The  procedure  from 
this  point  on  pertains  to  all  subjects.)   The  50  word  vocabulary 
was  then  trained  one  word  at  a  time.   The  experimenter  had  all 
of  the  T600  controls  outside  of  the  booth  and  closely  controlled 
the  training  process,  requiring  the  subject  to  retrain  words  as 
necessary  -  for  example,  if  a  word  was  initially  trained 
monotonously.   The  T600  was  next  put  in  the  recognition  mode  and 
recognition  of  each  word  of  the  vocabulary  was  checked.   Words 
which  initially  could  not  be  recognized  were  retrained  until 
they  could  be  correctly  recognized.   If  a  word  was  correctly 
recognized  immediately  it  was  not  checked  further.   Words  not 
correctly  recognized  immediately  were  retrained  if  more  than  one 
recognition error was obtained in three attempted recognitions
of  the  word.   Retrained  words  were  rechecked  and  retrained  again 
as  necessary. 

The  subject  next  received,  via  his  headset,  a  2.5  minute 
tape  recording  of  the  50  words  of  the  vocabulary  arranged  in 
random  order  and  presented  at  a  constant  rate  of  one  word  every 
three  seconds.   The  subject  was  instructed  to  repeat  the  words 
one  at  a  time  for  recognition  by  the  T600.   He  was  advised  to 
try  to  repeat  each  word  and  to  guess  with  a  word  in  the  vocabulary 
if  he  was  uncertain. 

Next  the  subject  was  briefed  on  the  three  RATER  tasks  that 
he  would  be  performing  -  delay  zero,  delay  one  and  delay  two. 
He  was  advised  that  his  RATER  scoring  would  be  number  of  correct 
responses  minus  number  of  incorrect  responses,  which  included  both 
omission  and  commission  errors.   The  subject  was  also  advised 
that  he  was  not  required  to  attain  any  particular  proficiency 
levels  on  the  RATER  but  that  it  was  sufficient  that  he  understood 
each  of  the  tasks  and  did  his  best.   He  was  then  allowed  to 
practice  the  three  RATER  tasks  for  up  to  20  minutes.   The  RATER 
was  used  in  the  self-pace  mode  during  parts  of  the  practice  if 
requested by the subject.  In the self-pace mode the symbol
displayed was replaced by the next symbol in the sequence only when
a  correct  response  was  made. 
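
The RATER scoring rule described to the subjects can be written
out as a short sketch.  The Python below is illustrative only; the
function name, the dictionary keys and the use of None to mark a
missed response are choices made here, not features of the RATER
counters.

```python
from typing import Dict, Optional, Sequence


def rater_score(expected: Sequence[str],
                pressed: Sequence[Optional[str]]) -> Dict[str, int]:
    """Score one RATER run as correct responses minus incorrect responses.

    expected[i] is the symbol whose button was correct on presentation i;
    pressed[i] is the button actually pressed, or None if no button was
    pressed.  Incorrect responses include both omission errors (no press)
    and commission errors (wrong button).
    """
    if len(expected) != len(pressed):
        raise ValueError("expected and pressed must have the same length")
    correct = omission = commission = 0
    for want, got in zip(expected, pressed):
        if got is None:
            omission += 1
        elif got == want:
            correct += 1
        else:
            commission += 1
    return {
        "correct": correct,
        "omission": omission,
        "commission": commission,
        "score": correct - (omission + commission),
    }
```

For example, two correct presses, one missed symbol and one wrong
button over four presentations give a score of zero.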

When  the  subject  advised  the  experimenter  that  he  no  longer 
wished  to  practice  on  the  RATER  the  subject  was  given  a  combined 
2.5  minute  RATER  delay  one  and  word  repetition  for  recognition 
practice.  The subject was played the same 2.5 minute tape
recording that he had heard earlier and was instructed as before
to repeat the words one at a time for recognition by the T600.
He was advised that this was the higher priority task but that
he was to simultaneously perform the RATER task as well as he
could  with  whatever  capabilities  he  had  remaining  after  attending 
to  the  priority  task.   The  subject  was  also  reminded  to  be  sure 
to  repeat  each  of  the  taped  words  and  to  guess  with  a  word  in 
the  vocabulary  if  he  was  uncertain. 

The subject was then exposed to four experimental conditions,
one for each operator mental loading condition - no RATER task
(NRT), RATER delay zero (RDO), RATER delay one (RD1) and RATER
delay two (RD2).  These were designed to create different levels
of operator mental loading.  Each condition lasted five minutes
and each of the 24 subjects received the four conditions in a
different order.

During  condition  NRT  the  subject  was  required  only  to  repeat 
two  different  consecutive  random  orderings  of  the  words  of  the 
vocabulary;  these  were  presented  to  him  over  his  headset  as  during 
practice.   The  first  time  through  the  vocabulary  in  any  condition 
was  referred  to  as  the  first  half  of  the  trial;  the  second  time 
was  referred  to  as  the  second  half  of  the  trial.   The  first  word 
of  the  second  half  followed  the  last  word  of  the  first  half  with 
the  same  spacing  used  within  the  two  halves;  the  subject  received 
no  cues  that  he  was  halfway  through  the  trial.   In  each  of  the 
conditions  RDO,  RD1  and  RD2  the  subject  was  similarly  required  to 
repeat  random  orderings  of  the  vocabulary  (two  different  orderings 
for each condition as in condition NRT); however, he was also
required  to  perform  simultaneously  the  appropriate  RATER  task. 
He  was  reminded  that  the  repetition  of  words  for  recognition 
by  the  T600  was  the  higher  priority  task  and  to  guess  with  a 
word  from  the  vocabulary  if  he  was  uncertain,  as  during  the 
combined practice.  (The purpose of this instruction was to
ensure that the T600 received the same, or at least nearly the
same,  utterances  for  recognition  during  each  trial  half  and 
thus  provide  a  common  basis  for  comparison  of  T600  recognition 
errors.)   By  monitoring  the  T600  CRT  display  and  RATER  counters, 
listening  to  booth  activity  via  the  booth  microphone,  and  post- 
experiment  questioning  of  subjects,  the  experimenter  ensured 
that  subjects  adhered  to  the  instructions  that  they  had  been 
given. 

Immediately  after  a  subject  completed  each  condition,  and 
before  he  was  allowed  to  leave  the  booth,  he  was  instructed  to 
complete  the  "Feeling  Tone  Checklist"  shown  in  Appendix  D  in 
accordance  with  the  instructions  also  shown  in  Appendix  D.   This 
checklist,  developed  by  Pearson  and  Byars  (1956),  was  administered 
to  assess  possible  differential  subjective  fatigue  after  each 
of  the  four  different  mental  loading  conditions. 

During  the  experimental  conditions  subjects  were  not  given 
feedback  on  their  RATER  performance.   During  the  practice  sessions 
the  only  feedback  given  to  subjects  regarding  T600  recognition 
of their speech was the knowledge of which words required
retraining; no feedback regarding T600 recognition performance
was given to subjects during the experimental conditions.  Those
subjects who indicated interest on their "Subject Data Sheets"
were individually briefed immediately after they completed
the last experimental condition concerning their RATER
performance, T600 recognition of their speech and the hypotheses
being tested.

Subjects  were  allowed  to  take  short  rest  breaks  as  they 
wished  during  the  training  and  practice  sessions  and  before  each 
of  the  four  experimental  conditions.   A  drinking  fountain  was 
located  nearby  for  any  subjects  who  became  thirsty  or  whose 
throats  became  dry. 

E.   DEPENDENT  VARIABLES 

The  following  were  calculated  for  each  half  of  each  trial: 

1.  T600  recognition  errors  (RE's) 

2.  Subject  verbal  errors. 

In  this  experiment  a  T600  recognition  error  was  operationally 
defined  to  be  a  failure  of  the  T600  to  recognize  correctly  any 
vocabulary  word  which  a  subject  said;  this  included  both  incor- 
rect recognition  (for  example,  the  subject  said  "beat"  and  the 
T600  "thought"  he  said  "peat")  and  rejection  (for  example,  the 
subject  said  "dip"  and  the  T600  failed  to  recognize  it  and 
emitted a "beep" sound).  This definition is different from
most definitions of recognition error in the voice recognition
literature which do not include rejections - for example,
Martin and Grunza (1974).  The operational definition used in
this experiment was considered more consistent with the aim of
this research - i.e. to answer the question:  Would increased
operator  mental  workload  (with  respect  to  that  experienced  during 
training  of  the  recognition  device)  result  in  changes  in  his 
speech  which  would  in  turn  result  in  degraded  performance  of 
the  voice  recognition  system?   It  was  believed  that  if  the  T600 
rejected "dip" when said by a subject under condition RD2, but
not  when  said  by  the  same  subject  under  condition  NRT,  this 
suggested  changes  in  system  performance  as  a  result  of  changes 
in  the  subject's  speech  and  accordingly  should  be  recorded  and 
analyzed. 

A  subject  verbal  error  was  defined  as  a  failure  of  the 
subject  to  repeat  correctly  the  presented  word.   This  failure 
could  be  either  a  failure  to  respond  (omission)  or  responding 
with  a  non-vocabulary  word  or  the  wrong  vocabulary  word 
(commission).
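
The two operational definitions can be combined into a single
scoring rule for each presented word.  The sketch below is
illustrative only: the function name and labels are choices made
here, and scoring the T600 against the word actually spoken (rather
than the word presented) is a reading of the definition "any
vocabulary word which a subject said", not a documented procedure.

```python
from typing import Optional, Set, Tuple


def classify(presented: str,
             spoken: Optional[str],
             recognized: Optional[str],
             vocabulary: Set[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (subject verbal error, T600 recognition error) for one word.

    spoken is None when the subject did not respond; recognized is None
    when the T600 rejected the utterance.  Each element of the result is
    None when no error of that kind occurred.
    """
    # Subject verbal error: failure to repeat the presented word.
    if spoken is None:
        verbal = "omission"
    elif spoken != presented:
        verbal = "commission"
    else:
        verbal = None

    # T600 recognition error: counted only when a vocabulary word was said.
    recognition = None
    if spoken is not None and spoken in vocabulary:
        if recognized is None:
            recognition = "rejection"
        elif recognized != spoken:
            recognition = "incorrect recognition"
    return verbal, recognition
```

Under this reading, a subject who says "peat" when "beat" was
presented commits a verbal error, and the T600 commits no
recognition error if it outputs "peat".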

F.   HYPOTHESES 

The  following  hypotheses  were  to  be  tested. 

1.   Hypotheses Regarding T600 Performance

a.  H0:  The different levels of operator mental loading
         would not have different effects on T600
         recognition error rate.

    H1:  H0 false.

         It was expected that increased operator loading
         would result in increased recognition error
         rate (RER), i.e. RER(NRT) < RER(RDO) < RER(RD1)
         < RER(RD2).

b.  H0:  The two trial halves would not have different
         effects on T600 recognition error rate.

    H1:  H0 false.

c.  H0:  "Little experience" subjects would generate
         the same T600 recognition error rate as "no
         experience" subjects.

    H1:  H0 false.

         It was expected that "little experience"
         subjects would generate a lower recognition
         error rate than "no experience" subjects.

2.   Hypotheses Regarding Subject Performance

a.  H0:  The different levels of operator mental loading
         would not have different effects on subject
         verbal error rate.

    H1:  H0 false.

         It was expected that increased operator loading
         would result in increased subject verbal
         error rate (VER), i.e. VER(NRT) < VER(RDO)
         < VER(RD1) < VER(RD2).  (This hypothesis was
         suggested by the research of Johnston (1975)
         who observed a significant detrimental effect
         of a simultaneous compensatory tracking task
         on speech intelligibility in noise.)

b.  H0:  The two trial halves would not have different
         effects on subject verbal error rate.

    H1:  H0 false.

c.  H0:  The different RATER delay modes used would not
         have different effects on subject RATER
         performance (score).

    H1:  H0 false.

         It was expected that subjects' RATER scores
         would decrease with increasing delay mode.

d.  H0:  Subject subjective fatigue (as measured by
         the "Feeling Tone Checklist" of Pearson and
         Byars, 1956) would be the same for the four
         operator mental loading conditions.

    H1:  H0 false.

         It was expected that increased operator loading
         would result in increased subjective fatigue
         (SF), i.e. SF(NRT) < SF(RDO) < SF(RD1) < SF(RD2).

Subject T600 experience was not expected to affect subject
verbal error rate or RATER performance and hypotheses regarding
this were not devised.  RATER performance was not recorded at
the end of the first half of trials and hypotheses regarding
RATER performance versus trial half were not devised.

G.   EXPERIMENTAL  DESIGN 

A  conceptual  design  for  the  experiment  is  shown  in  Figure  4. 
This  is  a  three  factor  nested-factorial  design.   Each  subject 
is  nested  within  only  one  of  the  T600  experience  level  groups. 

Each of the 24 subjects was exposed to the four experimental
conditions in a different order.  Each condition was presented
an equal number of times in each of the four order positions -
first, second, third, and fourth - within both the "little
experience" and "no experience" groups.  Subject to these
restrictions the order of presentation of the four conditions to
any particular subject was assigned randomly.
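
One way to satisfy these restrictions is sketched below.  It is
not necessarily the procedure that was used, which the report does
not describe in detail.  The sketch relies on the fact that the 24
possible orderings of four conditions fall into six sets of four
cyclic rotations, and that within each such set every condition
occupies every order position exactly once.  The function names
and the seed parameter are assumptions made for illustration.

```python
import itertools
import random

CONDITIONS = ("NRT", "RDO", "RD1", "RD2")


def rotation_classes(conditions=CONDITIONS):
    """Split all orderings into sets of cyclic rotations (Latin squares)."""
    first, rest = conditions[0], conditions[1:]
    classes = []
    for tail in itertools.permutations(rest):
        base = (first,) + tail
        # The rotations of one ordering place each condition once per position.
        classes.append([base[s:] + base[:s] for s in range(len(base))])
    return classes


def assign_orders(n_little=16, n_none=8, seed=None):
    """Randomly assign a distinct, position-balanced ordering to each subject."""
    if n_little % 4 or n_none % 4 or n_little + n_none != 24:
        raise ValueError("group sizes must be multiples of 4 totalling 24")
    rng = random.Random(seed)
    classes = rotation_classes()
    rng.shuffle(classes)              # which rotation sets go to which group
    n_none_classes = n_none // 4
    none_orders = [o for c in classes[:n_none_classes] for o in c]
    little_orders = [o for c in classes[n_none_classes:] for o in c]
    rng.shuffle(none_orders)          # which subject gets which ordering
    rng.shuffle(little_orders)
    return {"little experience": little_orders, "no experience": none_orders}
```

With 16 and 8 subjects, each condition then appears four times in
each order position in the "little experience" group and twice in
the "no experience" group, and no two subjects share an ordering.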

Subject verbal error rate and T600 recognition error rate
data were expected to be inherently Binomial in nature.  In the
case of subject verbal errors, the values of p, the probabilities
of a subject verbal error, or equivalently, subject verbal error
rates, were expected to be small.  Because of this and because
the values of n, number of words to be spoken, were relatively
large, it was concluded that the distributions of subject verbal
errors could be approximated by Poisson distributions and
statistical methods based on the Poisson distribution were
selected to test subject verbal error rate hypotheses.
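
The quality of the Poisson approximation for small p can be
checked numerically.  The sketch below is illustrative only.  It
measures the total variation distance between a Binomial(n, p)
distribution and a Poisson distribution with the same mean, for
n = 100 (the number of words spoken per condition).  The two
values of p are hypothetical, chosen to contrast a small verbal
error rate with a rate of about ten errors per 100 utterances.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k errors in n independent words."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k errors with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def total_variation(n: int, p: float) -> float:
    """Total variation distance between Binomial(n, p) and Poisson(n * p)."""
    lam = n * p
    absolute_difference = 0.0
    poisson_mass = 0.0
    for k in range(n + 1):
        q = poisson_pmf(k, lam)
        absolute_difference += abs(binomial_pmf(k, n, p) - q)
        poisson_mass += q
    # Poisson mass above n, where the binomial has no probability.
    absolute_difference += max(0.0, 1.0 - poisson_mass)
    return 0.5 * absolute_difference


N_WORDS = 100
tv_small_p = total_variation(N_WORDS, 0.01)   # hypothetical small error rate
tv_large_p = total_variation(N_WORDS, 0.10)   # hypothetical larger error rate
print(f"p = 0.01: {tv_small_p:.4f}   p = 0.10: {tv_large_p:.4f}")
```

The distance grows with p, which is consistent with using the
Poisson approximation for the small verbal error rates but not
for the larger recognition error rates.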

In the case of T600 recognition error rates, the values of p,
probabilities of a recognition error or recognition error rates,
were expected to be too large to permit analyses based on the
Poisson distribution.  It was decided that a parametric analysis
of variance would be used to test recognition error rate
hypotheses; prior to this analysis the data would be transformed
using the arcsin transformation, $y' = 2\arcsin(y^{1/2})$, to
remove the relationship between the variance and mean expected
because of the binomial nature of the data.
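
The effect of this transformation can be sketched as follows.  For
a binomial proportion observed over n utterances the variance is
p(1 - p)/n, which depends on the mean p.  A first-order
(delta-method) approximation shows that the variance of the
transformed value is about 1/n regardless of p.  The Python below
is illustrative only; the function names are chosen here, and
n = 50 corresponds to one pass through the vocabulary.

```python
import math


def arcsin_transform(y: float) -> float:
    """Return 2 * arcsin(sqrt(y)) for a proportion y between 0 and 1."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must be a proportion between 0 and 1")
    return 2.0 * math.asin(math.sqrt(y))


def binomial_variance(p: float, n: int) -> float:
    """Variance of an observed proportion with true rate p over n trials."""
    return p * (1.0 - p) / n


def approx_transformed_variance(p: float, n: int) -> float:
    """Delta-method variance of the transformed proportion."""
    # Derivative of 2 * asin(sqrt(p)) with respect to p.
    derivative = 1.0 / math.sqrt(p * (1.0 - p))
    return derivative**2 * binomial_variance(p, n)


N_WORDS = 50
for rate in (0.05, 0.10, 0.25):
    print(rate,
          round(binomial_variance(rate, N_WORDS), 5),
          round(approx_transformed_variance(rate, N_WORDS), 5))
```

The raw variance changes with the error rate while the
approximate transformed variance stays at 1/n.  The transformed
values range from 0 (no errors) to pi (all errors).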




Non-parametric  tests  were  selected  for  testing  hypotheses 
regarding  RATER  scores  and  subjective  fatigue  because  these  data 
were  not  expected  to  meet  the  assumptions  of  parametric  tests. 

Because  of  the  exploratory  nature  of  this  research,  a  level 
of significance, α, of .10 was elected during the design phase.
This  value  was  used  in  all  tests  of  hypotheses. 

H.   RESULTS 

1.   Results for T600 Performance

Appendices E, F, G, H and I present separate confusion
matrices for each of the four operator mental loading -
experimental conditions (NRT, RDO, RD1 and RD2) and for all four
conditions combined respectively.  A matrix element $a_{ij}$ of
these matrices indicates the proportion of the time that the T600
"thought" that a subject said word j when the subject actually
said word i.  Mean T600 recognition error rates for each operator
mental loading condition, trial half, subject T600 experience
level and vocabulary word type, expressed in recognition errors
per 100 spoken utterances, are shown in Table I.  Results for the
operational words show an error rate of 2.91% which is similar to
the results of Poock (1980) and Armstrong (1980).
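
The means reported in Table I can be cross-checked against the
overall rate of 12.67%.  The sketch below is illustrative only.  It
weights each breakdown by the number of subjects or vocabulary
words in each category (16 and 8 subjects; 16 rhyming, 14
non-rhyming but similar and 20 operational words).  This assumes
that every cell contributed about the same number of spoken
utterances, which is only approximately true because of
occasional omissions.

```python
def weighted_mean(rates_and_weights):
    """Weighted mean of (rate, weight) pairs."""
    total_weight = sum(weight for _, weight in rates_and_weights)
    return sum(rate * weight for rate, weight in rates_and_weights) / total_weight


# (mean error rate in percent, weight) pairs taken from Table I.
TABLE_I = {
    "condition": [(10.77, 1), (13.18, 1), (13.14, 1), (13.60, 1)],
    "trial half": [(11.73, 1), (13.61, 1)],
    "experience": [(12.26, 16), (13.50, 8)],
    "word type": [(25.17, 16), (12.33, 14), (2.91, 20)],
}
OVERALL = 12.67

recomputed = {name: weighted_mean(rows) for name, rows in TABLE_I.items()}
for name, value in recomputed.items():
    print(f"{name:12s} {value:.2f}  (reported overall {OVERALL:.2f})")
```

Each breakdown reproduces the reported overall rate to within
rounding, which suggests the table entries are internally
consistent.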

TABLE I
MEAN T600 RECOGNITION ERROR RATES*

BY OPERATOR MENTAL LOADING - EXPERIMENTAL CONDITION

    NRT                            10.77%
    RDO                            13.18%
    RD1                            13.14%
    RD2                            13.60%

BY TRIAL HALF

    First half                     11.73%
    Second half                    13.61%

BY SUBJECT T600 EXPERIENCE LEVEL

    "Little experience"            12.26%
    "No experience"                13.50%

BY VOCABULARY WORD TYPE

    Rhyming                        25.17%
    Non-rhyming but similar        12.33%
    Operational                     2.91%

OVERALL                            12.67%

*  Expressed in recognition errors per 100 spoken utterances.
A recognition error was operationally defined in this research
to be a failure of the T600 to recognize correctly any vocabulary
word which S spoke and includes both incorrect recognition and
rejection of vocabulary words; recognition errors do not include
those cases where S spoke a word not in the vocabulary (or coughed,
sighed, etc.) and the T600 generated a recognition.

Figure 5 is a plot of the recognition error rate observations
and Figure 6 a plot of the arcsin transformed recognition error
rate observations.  Figure 6 shows that the parametric analysis
of variance homogeneity of variance assumption was adequately
met.  Since the parametric analysis of variance is quite robust
regarding its Normality assumption (Scheffe, 1959), it was felt
that this assumption also was adequately met and a parametric
analysis of variance (Winer, 1962) was performed on the arcsin
transformed data.  The results are summarized in Table II.  The
model for this analysis was:


\[
Y_{ijkm} = \mu + L_i + H_j + E_k + S_{m(k)} + LH_{ij} + LE_{ik} + HE_{jk} + LHE_{ijk} + \epsilon_{ijm(k)}
\]

where

$Y_{ijkm}$ = arcsin transformed recognition error rate for operator
mental loading condition i, trial half j, T600 experience level k,
and subject m; the range of $Y_{ijkm}$ is 0 to $\pi$.

$\mu$ = common experimental contribution to $Y_{ijkm}$

$L_i$ = contribution of operator mental loading condition i,
i = 1,2,3,4 (NRT, RD0, RD1, RD2)

$H_j$ = contribution of trial half j, j = 1,2 (first half,
second half)

$E_k$ = contribution of T600 experience level k, k = 1,2
("Little experience", "No experience")

$S_{m(k)}$ = contribution of subject m within T600 experience
level k
    m = 1,2, ..., 16 for k = 1
    m = 1,2, ..., 8  for k = 2

$\epsilon_{ijm(k)}$ = random error

Subject effects were considered to be random; all others were
considered to be fixed.

The analysis showed mental loading to be significant
(F = 4.88, df = 3/66, p < .005).  A parametric Range Test
TABLE II
ANALYSIS OF VARIANCE FOR T600 RECOGNITION ERROR RATE

Source                                   df       MS         F

Between Subjects                         23
    E (T600 experience)                   1     .0712
    Subj. w. groups                      22     .0715

Within Subjects                         168
    L (Operator mental loading
       condition)                         3     .0746      4.88*
    E x L                                 3     .0114
    L x subj. w. groups                  66     .0153

    H (trial half)                        1     .1819     13.38*
    E x H                                 1     .0001
    H x subj. w. groups                  22     .0136

    L x H                                 3     .0058
    E x L x H                             3     .0150
    L x H x subj. w. groups              66     .0201

* p < .005
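
As a check on Table II, each F ratio is an effect mean square divided by the mean square of its subject-by-effect error term (the degrees of freedom quoted in the text, 3/66 and 1/22, identify those terms). The sketch below recomputes the two significant ratios. It assumes the MS entries are decimal fractions (.0746, .0153, and so on); the function name is illustrative and not from the report.

```python
def f_ratio(ms_effect: float, ms_error: float) -> float:
    """Return the F ratio of an effect mean square to its error mean square."""
    if ms_error <= 0:
        raise ValueError("error mean square must be positive")
    return ms_effect / ms_error

# Mean squares as read from Table II (leading decimal points assumed).
f_loading = f_ratio(0.0746, 0.0153)  # L tested against L x subj. w. groups
f_half = f_ratio(0.1819, 0.0136)     # H tested against H x subj. w. groups

print(f"F(loading) = {f_loading:.3f}, F(trial half) = {f_half:.3f}")
```

Both ratios agree with the tabled values of 4.88 and 13.38 to within rounding.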
(Hicks, 1973) was performed to determine which operator mental
loading conditions were statistically different (with respect
to T600 recognition error rates), and it was found that the only
significant differences (α = .10) were those between condition
NRT and each of the other three conditions, RD0, RD1 and RD2.
The analysis also showed recognition error rate to be higher
in the second half of trials than in the first half (F = 13.38,
df = 1/22, p < .005).  Subject T600 experience level was not
significant (F < 1).  No interactions were significant (all
F's < 1).  Figure 7 shows recognition error rate versus
operator mental loading condition for each trial half.

Subjects  were  instructed  to  repeat  each  vocabulary  word 
heard  and  to  guess  with  a  word  in  the  vocabulary  if  uncertain 
of  the  word.   The  purpose  of  this  instruction  was  to  ensure 
that  the  T600  received  the  same,  or  at  least  nearly  the  same, 
utterances  for  recognition  during  each  trial  half,  i.e.  each 
vocabulary  word  once,  and  thus  provide  a  common  basis  for 
comparison  of  T600  recognition  errors.   Despite  the  instruction 
a  total  of  53  instances  arose  where  subjects  either  did  not 
speak  any  word  or  spoke  a  word  not  in  the  vocabulary;  these 
are  tabulated  in  Appendix  J.   T600  recognition  errors,  as 
operationally  defined  in  this  research,  could  not  occur  in 
these  instances  and  the  following  adjustment  was  made  to 
establish  a  reasonably  common  basis  for  comparison.   If  x  T600 
recognition  errors  occurred  in  a  particular  trial  half  for  a 
subject  and  that  subject  made  y  errors  of  this  type  in  the  trial 
half,  then  the  error  rate  observation  on  which  the  analysis 
was  based  was  x/(50-y),  not  x/50. 
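
The adjustment above, followed by the arcsin transformation applied before the analysis of variance, can be sketched as follows. The report states only that the transformed values range from 0 to π; the form 2·arcsin(√p) used here is an assumption consistent with that range, and the function names and example counts are hypothetical.

```python
import math

WORDS_PER_HALF = 50  # each vocabulary word is presented once per trial half

def adjusted_error_rate(x: int, y: int, n: int = WORDS_PER_HALF) -> float:
    """Rate of x recognition errors over the n - y utterances the T600 could score."""
    if not 0 <= y < n or not 0 <= x <= n - y:
        raise ValueError("counts out of range")
    return x / (n - y)

def arcsin_transform(p: float) -> float:
    """Assumed transform 2*arcsin(sqrt(p)), which maps [0, 1] onto [0, pi]."""
    return 2.0 * math.asin(math.sqrt(p))

# Hypothetical trial half: 6 recognition errors, 2 prompts with no usable utterance.
rate = adjusted_error_rate(x=6, y=2)
print(rate, arcsin_transform(rate))
```

Dividing by 50 - y rather than 50 keeps the denominator equal to the number of utterances on which a recognition error, as defined in this research, could actually occur.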
FIGURE 7.  MEAN T600 RECOGNITION ERROR RATES
(in recognition errors per 100 spoken utterances)
2.  Results for Subject Performance

Appendix  J  shows  total  subject  verbal  errors  for  each  subject 
for  each  half  of  each  trial  under  each  operator  mental  loading 
condition.   Mean  subject  verbal  error  rates  for  each  mental 
loading  condition,  trial  half,  subject  T600  experience  level  and 
vocabulary  word  type,  expressed  in  subject  verbal  errors  per 
100  words  presented  to  the  subject  for  repetition  (i.e.  each 
word  of  the  50  word  vocabulary  twice),  are  shown  in  Table  III. 

Tests based on the Poisson distribution (Cox and Lewis,
1966) were performed on the subject verbal error rate data.
It was concluded that the operator mental loading condition
effect was significant (p < .01, α = .10) and that the trial half
effect was not significant (p > .8, two-tailed test, α = .10).
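
The report does not say which Poisson-based test was used. One common choice for comparing two counts with equal exposure is the exact conditional test: given the total, the first count is Binomial(n, 1/2) under the hypothesis of equal rates. The sketch below applies that assumed test to the trial-half totals of 43 and 40 subject verbal errors. These counts are the ones implied by the .90% and .83% rates of Table III, taking 4,800 presentations per half (24 subjects x 4 conditions x 50 words). The function name is illustrative.

```python
from math import comb

def poisson_two_sample_p(c1: int, c2: int) -> float:
    """Two-sided exact p-value for equal Poisson rates with equal exposure.

    Conditional on n = c1 + c2, c1 is Binomial(n, 1/2) under the null hypothesis.
    """
    n = c1 + c2
    observed = comb(n, c1)
    # Sum the probabilities of every outcome no more likely than the observed one.
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n

p_half = poisson_two_sample_p(43, 40)
print(f"trial-half p-value: {p_half:.3f}")
```

Under these assumptions the p-value exceeds .8, in line with the conclusion reported for the trial half effect.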

Subject RATER scores are shown in Appendix K.  A non-
parametric Friedman two-way analysis of variance (Siegel, 1956)
was performed on the RATER scores and it was concluded that
scores varied by delay mode (χ² = 42.75, df = 2, p < .0005,
α = .10).  A non-parametric test proposed by Nemenyi (in Kirk,
1968, p. 497) was performed to determine which pairwise comparisons
of RATER scores were significant; it was found that all pairwise
differences were significant (p < .05) with RATER performance
declining as the delay mode increased from 0 to 1 to 2.
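
For readers unfamiliar with the Friedman statistic, a minimal sketch follows. Each subject's scores are ranked across conditions, and the statistic is computed from the column rank sums. The sketch omits any tie correction, and the scores are made up for illustration; they are not the Appendix K data.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def friedman_chi_square(rows):
    """Friedman statistic (no tie correction); rows are subjects, columns conditions."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, r in enumerate(average_ranks(row)):
            rank_sums[col] += r
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)

# Hypothetical scores: three subjects, three delay modes, same ordering in every row.
scores = [[10, 8, 3], [9, 7, 4], [12, 11, 5]]
chi2 = friedman_chi_square(scores)
print(chi2)
```

When every subject orders the conditions identically, the statistic reaches its maximum of n(k - 1). For 24 subjects and 3 delay modes that maximum is 48, so the reported value of 42.75 indicates a nearly unanimous ordering.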

The  results  of  the  subjective  fatigue  inquiry  are  shown  in 
Appendix  L.   Numerical  scores  shown  were  obtained  by  multiplying 
the  number  of  items  scored  "better  than"  by  two  and  adding  the 
number  of  items  scored  "same  as",  as  recommended  by  Pearson  and 
Byars  (1956) .   A  non-parametric  Friedman  two-way  analysis  of 
TABLE III
MEAN SUBJECT VERBAL ERROR RATES*

BY OPERATOR MENTAL LOADING - EXPERIMENTAL CONDITION
    NRT                             .42%
    RD0                             .63%
    RD1                            1.04%
    RD2                            1.38%

BY TRIAL HALF
    First half                      .90%
    Second half                     .83%

BY SUBJECT T600 EXPERIENCE LEVEL
    "Little experience"             .98%
    "No experience"                 .63%

BY VOCABULARY WORD TYPE
    Rhyming                         .81%
    Non-rhyming but similar        1.15%
    Operational                     .70%

OVERALL                             .86%

* Expressed in subject verbal errors per 100 vocabulary words
presented to S via the headset.  A subject verbal error was
defined in this research to be a failure of the subject
to repeat correctly the presented vocabulary word.  This
failure could be either a failure to respond (omission) or
responding with a non-vocabulary word or the wrong vocabulary
word (commission).
variance was performed on these data and it was concluded that
subjective fatigue did not differ significantly among the four
operator mental loading conditions (χ² = 3.09, df = 3, p > .3,
α = .10).

The unexpected difference between the mean subject verbal
error rates for "Little experience" and "No experience" subjects,
shown in Table III, prompted the author to test whether or not
this difference was significant.  A test based on the Poisson
distribution was performed, and it was concluded that the
difference was significant (p < .10, two-tailed test, α = .10).

3.  General Results

The  following  were  investigated  graphically: 

a.  T600  recognition  error  rate  versus  subject  verbal 
error  rate;  and, 

b.  RATER  scores  versus  subject  verbal  error  rates. 

No relationships were apparent.  Spearman rank correlation
coefficients between subject RATER scores and T600 recognition
error rates were calculated for each delay mode; none was found
to be significant (r_s(RD0) = -.110; r_s(RD1) = .127;
r_s(RD2) = -.214; r_s(critical) = ±.343, two-tailed test,
α = .10).
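
The decision rule applied above can be sketched as follows. The coefficient is computed with the standard no-ties formula 1 - 6Σd²/(n(n² - 1)), and each reported coefficient is compared with the critical value quoted in the text. The function names are illustrative, and the report does not say whether tied ranks were corrected for.

```python
def spearman_rs(x, y):
    """Spearman rank correlation by the no-ties formula; assumes distinct values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples of at least two values")
    rank_x = {v: r for r, v in enumerate(sorted(x), start=1)}
    rank_y = {v: r for r, v in enumerate(sorted(y), start=1)}
    d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

R_CRITICAL = 0.343  # two-tailed critical value quoted in the text for alpha = .10

def is_significant(rs: float, critical: float = R_CRITICAL) -> bool:
    """A coefficient is significant when its magnitude reaches the critical value."""
    return abs(rs) >= critical

# Coefficients as reported for the three delay modes.
reported = {"RD0": -0.110, "RD1": 0.127, "RD2": -0.214}
flags = {mode: is_significant(rs) for mode, rs in reported.items()}
print(flags)
```

None of the three reported coefficients reaches the critical magnitude, which matches the conclusion stated in the text.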

I.   DISCUSSION 

Operator mental loading had a significant differential effect
on subject verbal error rate, as expected, but trial half did not.
"Little experience" subjects had a higher subject verbal error
rate than "no experience" subjects; why this occurred is not known.

The subjective fatigue checklist used did not disclose
significant differences between any of the four operator mental
loading - experimental conditions.  This was probably partly
because the effects of order of presentation of the conditions
dominated any possible condition effects during subjects' scoring
of the checklists.  (Several subjects advised the experimenter
after a RATER condition that the condition was more fatiguing
than condition NRT but that they had to score the RATER condition
higher because it was the last, or next to last, condition and
they felt good because the end was at hand.)

The  following  hypotheses  were  confirmed. 

1.  Operator mental loading affected the performance of
the voice recognition system in that T600 recognition
error rates in the three conditions involving
concurrent RATER tasking were 23% greater than the error
rate of the no RATER task condition.

2.  Performance of the voice recognition system during the
first 2.5 minutes of a trial differed from that during
the second 2.5 minutes.  A future experiment will
investigate this possible degradation over time.

3.  T600 recognition error rates were not statistically
different for "no experience" and "little experience"
(with respect to the T600) subjects.  This may simply
be due to the limited experience of even the most
experienced subject, who had only 12 hours of previous
experience.
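
The 23% figure in item 1 can be checked against the condition means of Table I. The following worked ratio assumes the three RATER conditions are weighted equally, which the report does not state explicitly:

```latex
\[
\frac{(13.18 + 13.14 + 13.60)/3}{10.77} = \frac{13.31}{10.77} \approx 1.236
\]
```

That is, the rounded means give an increase of roughly 23 to 24 percent over the NRT condition, consistent with the stated figure.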

It must be emphasized that the recognition error rates
obtained with the T600 in this experiment are at least ten times
what has commonly been found.  These higher recognition error
rates were deliberately sought by the experimenters (as
discussed earlier) and are primarily due to the vocabulary selected.
The average error rate on the 20 operational vocabulary words
was 2.91%; the average error rate on the 30 words taken from
the Modified Rhyme Test was 19.18%.  A non-parametric Friedman
two-way analysis of variance was performed, and it was concluded
that recognition error rate differed by vocabulary word type
(rhyme, non-rhyme but similar, and operational) (χ² = 45.06,
df = 2, p < .0005).  A non-parametric test proposed by Nemenyi
(in Kirk, 1968, p. 497) was performed to determine which pairwise
comparisons of recognition error rates were significant; it
was concluded that all pairwise differences were significant
(p < .01).
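
The 19.18% figure for the Modified Rhyme Test words and the overall rate follow from the word-type means of Table I. The check below takes the word counts from the Appendix A listing (16 rhyming, 14 non-rhyming but similar, 20 operational) and assumes every word was spoken equally often:

```latex
\[
\frac{16(25.17) + 14(12.33)}{30} = \frac{575.34}{30} \approx 19.18,
\qquad
\frac{30(19.18) + 20(2.91)}{50} = \frac{633.60}{50} \approx 12.67
\]
```

Both values agree with the rates reported in the text and in Table I.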

After the a priori hypotheses had been tested it was
suggested that the T600 recognition error hypotheses be retested
using only operational vocabulary word data.  This was done using
tests based on the Poisson distribution.  The analysis showed
the operator mental loading condition effect to be significant
(p < .10), as it was when using the whole vocabulary.  The trial
half effect was found to be not significant (p > .2).  It is
not known whether this result indicates that the trial half
difference observed when using the whole vocabulary was not
present with the operational words or whether it indicates that
the test using just the operational words was not powerful enough
to detect the difference.  This uncertainty will be investigated
in a future experiment.  The analysis also showed that "no
experience" subjects generated higher recognition error rates
than "little experience" subjects (p < .10, two-tailed test,
α = .10).

This  may  be  due  to  the  fact  that  the  "little  experience" 
subjects  had  more  experience  inputting  the  operational  words 
of  the  vocabulary  than  the  "no  experience"  subjects.   Most  of 
the  operational  words  used  were  also  used  in  the  experiment 
by  Poock  (1980)  in  which  all  of  the  "little  experience"  subjects 
participated;   none  of  the  "no  experience"  subjects  participated 
in  that  experiment. 
APPENDIX A
VOCABULARY LISTING (BY WORD TYPE)

RHYMING
    gale        tale
    gold        cold
    game        came
    bark        £ark
    tip         dip
    big         pig
    beat        £eat
    ten         den

NON-RHYMING BUT SIMILAR
    sa£         sat
    peas        peace
    race        raze
    save        safe
    lake        late
    kit         kid
    mad         mat

OPERATIONAL
    list        course
    attack      time
    plot        bingo
    speed       air
    report      dive
    fire        distance
    drop        launch
    copy        refuel
    cancel      proceed
    label       station

A vocabulary listing in the order in which the words were
trained is attached to the written instructions initially
given to subjects and shown in Appendix C.

APPENDIX B
SUBJECT  DATA  SHEET 

Subject  number:  Name: Age: 

Time/date:  Service: 


Rank:  MOS  (in  words) : 


Do  you  object  to  being  taperecorded  during  the  experiment?  If 
you  do,  stop  filling  out  this  form  and  advise  the  experimenter 
now;  otherwise,  continue. 

How  many  hours  experience  have  you  had  on  voice  recognition 
equipment  in  the  last  six  months? 

hours  (approximately) 


How  many  hours  experience  have  you  had  on  reaction  measurement 
devices  in  the  past  year? 

hours  (approximately) 

Do  you  have  a  speech  or  hearing  impediment?   Yes      No 

(circle  one) 

Do  you  want  a  post  participation  briefing  on  your  performance 
and  on  the  hypotheses  being  tested  by  the  experimenter?  Note 
that  if  you  request  such  a  briefing,  you  must  agree  not  to 
discuss  this  with  anyone  other  than  the  experimenter  so  that 
no  subject  will  learn  what  results  are  expected  prior  to  his 
participation  in  the  experiment;  such  prior  knowledge  could 
invalidate  the  results  of  the  experiment. 

Yes      No 
(circle  one) 

After  you  have  completed  participation  in  the  experiment  you 
will  be  asked  to  write  below  any  comments  which  you  think  may 
be  useful  to  the  experimenter.   If  you  have  any  questions  now, 
please  ask  the  experimenter.   Otherwise,  give  him  this  form 
now  and  start  reading  the  pages  titled  "INTRODUCTORY  REMARKS/ 
RECOGNIZER  VOCABULARY  TRAINING" . 


POST  EXPERIMENT  COMMENTS 

(continue  on  reverse  side  if  this  space  is  insufficient) 
THANK  YOU  FOR  YOUR  PARTICIPATION 


APPENDIX  C 
WRITTEN  INSTRUCTIONS 
INTRODUCTORY  REMARKS  /  RECOGNIZER  VOCABULARY  TRAINING 
INTRODUCTORY  REMARKS 

This  experiment  involves  analysis  of  a  combined  human 
operator  /  voice  recognition  equipment  system  under  various 
conditions  of  operator  mental  loading.   The  actual  experiment 
will  be  carried  out  in  a  sound-proof  booth  and  subject  - 
experimenter  communication  during  the  actual  experiment  will 
be  via  the  booth  intercom  system;  however,  you  may  remove  the 
headset  assembly  during  break  periods  and  leave  the  booth. 
CAUTION:  The mounting of the voice recognizer microphone
on the headset assembly is very delicate, easily
damaged, and difficult to repair.  Please be careful
while handling this assembly.

Please  carry  out  the  experiment  exactly  as  directed  and 
do  not  discuss  your  performance  with  anyone  other  than  the 
experimenter  as  inappropriate  subject  prior  knowledge  could 
invalidate  the  results. 

VOICE  RECOGNIZER  VOCABULARY  TRAINING 

The 50 word vocabulary being used with the voice recognizer
in this experiment is attached to these instructions.
You will be required to repeat each word of this vocabulary
ten times to train the recognizer to recognize your particular
vocalizations of each word.  To facilitate recognition by
the voice recognizer, you should include in the ten repetitions
as many as possible of the different ways you might say the
word in normal speech; for example, use different intonations
and emphasis, and small variations in volume.

In  order  to  keep  track  of  the  number  of  times  you  say  each 
word,  and  to  reduce  breath  noise,  it  is  best  to  speak  the  10 
repetitions  in  several  groups.   For  example,  if  the  word  is 
zero,  it  is  better  to  group  them  as: 

000-000-0000 

or  000-000-000-0 

rather  than  as       0000000000 

or  0-0-0-0-0-0-0-0-0-0 

Please  observe  the  following  guidelines  while  inputting 
voice  data  to  the  recognizer  both  during  training  and  later 
during  the  actual  experiment. 

a.  Speak  each  word  crisply  and  quickly  but  do  not  over- 
pronounce;  for  example,  words  ending  in  "t"  -  delete 
final  "t"  if  more  natural. 

b.  Be sure to leave a distinct pause (specifically, at
least one-tenth of a second of silence) between each
word so that the recognizer can distinguish the end
of one word from the beginning of the next.  Similarly,
do not leave a period of silence within a word
or the recognizer will mistake it for two separate
words.

c.  Avoid  breathing  into  the  microphone  at  the  end  of 
words  as  this  will  generate  false  inputs  to  the 
recognizer. 
d.  Microphone location is very important and should be
kept constant throughout the experiment; i.e., adjust
it if it gets out of place.  The experimenter will
initially demonstrate correct microphone placement.

From this point on instructions will be given to you
verbally by the experimenter.  Please advise him if you have
any questions now.

VOCABULARY LISTING (IN TRAINING ORDER)

 0. attack      10. cancel      20. bark        30. late        40. proceed
 1. list        11. peas        21. £ark        31. course      41. mad
 2. gale        12. peace       22. launch      32. big         42. mat
 3. tale        13. speed       23. save        33. pig         43. fire
 4. bingo       14. game        24. safe        34. report      44. ten
 5. sa£         15. came        25. refuel      35. kit         45. den
 6. sat         16. distance    26. tip         36. kid         46. label
 7. time        17. race        27. dip         37. plot        47. air
 8. gold        18. raze        28. drop        38. beat        48. station
 9. cold        19. copy        29. lake        39. £eat        49. dive

APPENDIX D
SUBJECTIVE FATIGUE CHECKLIST

Subject number:                    Experimental condition:

FEELING TONE CHECK LIST

        Better    Same     Worse
No.     than      as       than      Statement

 1.     (   )     (   )    (   )     slightly tired
 2.     (   )     (   )    (   )     like I'm bursting with energy
 3.     (   )     (   )    (   )     extremely tired
 4.     (   )     (   )    (   )     quite fresh
 5.     (   )     (   )    (   )     slightly pooped
 6.     (   )     (   )    (   )     extremely peppy
 7.     (   )     (   )    (   )     somewhat fresh
 8.     (   )     (   )    (   )     petered out
 9.     (   )     (   )    (   )     very refreshed
10.     (   )     (   )    (   )     ready to drop
11.     (   )     (   )    (   )     fairly well pooped
12.     (   )     (   )    (   )     very lively
13.     (   )     (   )    (   )     very tired

Have you checked each statement?

INSTRUCTIONS FOR COMPLETING FEELING TONE CHECKLIST


People  feel  different  at  various  times  for  various  reasons. 
Some  arise  after  a  night's  rest  feeling  "quite  rested"  while 
others  may  feel  "a  little  tired".   A  hard  day's  work  or  a 
vigorous  workout  at  the  gym  may  make  you  feel  "fairly  well 
pooped";  yet,  a  shower,  a  cup  of  coffee,  or  merely  a  few 
minutes  relaxing  in  a  comfortable  chair  may  make  you  feel 
"very  refreshed" . 

I  would  like  to  find  out  how  you  feel  right  now.   On  the 
accompanying  sheet,  you  will  see  13  statements  which  describe 
different  degrees  of  freshness  or  peppiness  and  tiredness.   For 
each  statement  you  will  have  to  determine  in  your  own  mind 
whether you feel at this instant (1) "Better than", (2) the
"Same as", or (3) "Worse than" the feeling described by that
statement.  Having done this you will then place an "X" in the
appropriate box.

Consider  the  following  example: 


Better   Same   Worse 
No. than as than Statement 

0.         (   )     (   )    (   )  somewhat  tired 


If  right  now  you  felt  "somewhat  tired"  you  would  place  an 
"X"  in  the  box  marked  "Same  as".   If,  however,  you  felt  fresh 
or  full  of  pep  you  would  check  the  box  marked  "Better  than" 
because  you  would  be  feeling  better  than  "somewhat  tired". 
On  the  other  hand,  if  you  felt  exhausted  you  would  place  an 
"X"  in  the  box  marked  "Worse  than". 

Take each statement in order; do not skip around from one
to another.  Read each statement carefully so that you
understand what it means.  It may help you to understand some
statements if you mentally insert the words "I feel" or "I am"
before the statement.

This  is  not  a  test.   You  have  all  the  time  you  need. 

APPENDIX J
SUBJECT VERBAL ERRORS*

An entry w/x (y/z) indicates that a total of w subject verbal errors,
of which y were errors of not speaking any word or speaking a non-
vocabulary word (when prompted with a vocabulary word), occurred in
the first half of the trial and a total of x subject verbal errors,
of which z were errors of not speaking any word or speaking a non-
vocabulary word (when prompted with a vocabulary word), occurred in
the second half of the trial.

            OPERATOR MENTAL LOADING - EXPERIMENTAL CONDITION

SUBJECT
NUMBER      NRT            RD0            RD1            RD2

   1        0/0 (0/0)      0/0 (0/0)      1/0 (0/0)      4/2 (4/2)
   2        0/0 (0/0)      0/0 (0/0)      0/0 (0/0)      1/1 (1/1)
   3        0/0 (0/0)      1/0 (1/0)      1/2 (1/0)      1/1 (1/1)
   4        0/1 (0/0)      0/0 (0/0)      3/3 (2/2)      0/2 (0/0)
   5        1/1 (0/0)      0/0 (0/0)      0/0 (0/0)      0/1 (0/0)
   6        1/0 (1/0)      0/0 (0/0)      0/0 (0/0)      0/0 (0/0)
   7        0/0 (0/0)      0/0 (0/0)      4/2 (2/2)      0/1 (0/1)
   8        0/0 (0/0)      0/0 (0/0)      0/1 (0/1)      1/1 (0/1)
   9        0/0 (0/0)      0/1 (0/1)      0/1 (0/1)      0/1 (0/0)
  10        0/0 (0/0)      0/0 (0/0)      0/0 (0/0)      0/1 (0/1)
  11        0/0 (0/0)      0/0 (0/0)      0/0 (0/0)      1/0 (1/0)
  12        0/1 (0/1)      0/0 (0/0)      0/0 (0/0)      0/0 (0/0)
  13        0/0 (0/0)      0/0 (0/0)      0/0 (0/0)      2/1 (2/0)
  14        0/0 (0/0)      0/1 (0/1)      0/0 (0/0)      1/0 (0/0)
  15        1/1 (1/1)      2/2 (1/1)      1/1 (1/1)      2/1 (1/1)
  16        0/0 (0/0)      2/1 (1/0)      0/0 (0/0)      0/0 (0/0)
  17        0/0 (0/0)      1/1 (0/0)      0/0 (0/0)      0/0 (0/0)
  18        0/2 (0/0)      2/0 (2/0)      1/1 (1/0)      2/0 (2/0)
  19        0/1 (0/0)      0/0 (0/0)      1/0 (1/0)      0/0 (0/0)
  20        0/0 (0/0)      0/0 (0/0)      0/1 (0/1)      0/0 (0/0)
  21        0/0 (0/0)      0/0 (0/0)      0/0 (0/0)      0/1 (0/1)
  22        0/0 (0/0)      0/0 (0/0)      1/0 (1/0)      1/0 (1/0)
  23        0/0 (0/0)      1/0 (0/0)      0/0 (0/0)      1/1 (1/0)
  24        0/0 (0/0)      0/0 (0/0)      0/0 (0/0)      1/0 (1/0)

* A subject verbal error was defined in this research to be a failure
of a subject to repeat correctly the presented vocabulary word.
This failure could be either a failure to respond (omission) or
responding with a non-vocabulary word or the wrong vocabulary
word (commission).

Subjects  1  to  16  inclusive  had  "little  experience"  on  the  T600  and 
subjects  17  to  24  inclusive  had  "no  experience". 

APPENDIX K
RATER  SCORES 


SUBJECT 
NUMBER 

1 

2 

3 

4 

5 

6 

7 

8

9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 


OPERATOR MENTAL LOADING - EXPERIMENTAL CONDITION


RDO 

197 

191 

198 

195 

199 

186 

196 

193 

188 

194 

181 

188 

187 

185 

192 

179 

170 

188 

150 

186 

198 

195 

194 

182 


RDl 

188 

174 

171 

131 

182 

181 

156 

171 

172 

199 

188 

187 

174 

172 

184 

166 

128 

191 

139 

179 

182 

55 

163 

170 


RD2 

90 

99 

85 

102 

98 

96 

98 

64 

112 

144 

104 
71 
81 

154 

-23 
7  3 
94 
56 
91 

155 
31 

110 
87 


Subjects 1 to 16 inclusive had "little experience" on the T600 and
subjects 17 to 24 inclusive had "no experience".

Subjects  22  and  24  each  had  approximately  one  half  hour  prior  experience 
on  the  RATER;   no  other  subjects  had  prior  experience  on  the  RATER. 

To  avoid  unnecessarily  complex  instructions,  subjects  were  told  that 
their  RATER  scores  would  be  simply   number  of  correct  responses  minus 
number  of  incorrect  responses,  which  included  both  omission  and  commission 
errors.   This  made  the  RATER  tasks  more  demanding  since  it  discouraged 
both  guessing  and  failing  to  respond.   However,  it  is  not  possible  to 
determine  the  exact  number  of  errors  made  from  the  RATER  counters;   it 
is  only  possible  to  calculate  a  lower  bound  on  the  number  of  errors.   For 
this  reason,  the  RATER  scores  actually  assigned  were  calculated  with  the 
following  commonly  used  formula:   score  =  two  times  number  of  correct 
responses  minus  total  number  of  responses.   A  perfect  score  for  any 
experimental  condition  was  200. 
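
The scoring formula above can be sketched as a small function. Because incorrect responses equal total responses minus correct responses, the formula is equivalent to correct minus incorrect responses, with omissions left uncounted. The function name and the counter readings are hypothetical and are not taken from the experiment.

```python
def rater_score(correct: int, total_responses: int) -> int:
    """Score = 2 * correct - total responses, i.e. correct minus incorrect responses."""
    if not 0 <= correct <= total_responses:
        raise ValueError("correct must lie between 0 and total_responses")
    return 2 * correct - total_responses

# Hypothetical counter readings for two runs.
perfect = rater_score(correct=200, total_responses=200)
poor = rater_score(correct=40, total_responses=103)
print(perfect, poor)
```

A run with more incorrect than correct responses yields a negative score, which is how a value below zero can appear in the table above.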

APPENDIX L
SUBJECTIVE FATIGUE SCORES*

            OPERATOR MENTAL LOADING - EXPERIMENTAL CONDITION

SUBJECT
NUMBER      NRT       RD0       RD1       RD2

   1         18        18        18        18
   2         17        20        17        16
   3         13        13        13        13
   4         14        19        13        12
   5         12        12        14        13
   6         15        16        14        14
   7         13        13        10        13
   8         18        13        10        16
   9         10        13        12        12
  10         16        13        16        11
  11         12        11        12        11
  12         21        21        21        21
  13         16        16        19        16
  14         15        17        13        17
  15         11        17        12        18
  16         14        12         9        12
  17         16        16        16        12
  18         16        16        16        16
  19         13        12         7        12
  20         12        12        12        12
  21         16        13        11        12
  22         11        11        11        11
  23         14        15        14        18
  24         12        12        12         9

*  Higher  scores  are  associated  with  lower  subjective  fatigue  and 
vice  versa. 

Scores  were  obtained  by  multiplying  the  number  of  items  scored  as 
"better  than"  by  two  and  adding  the  number  of  items  scored  as  "same  as" 
as  recommended  by  those  who  developed  the  checklist  (Pearson  and  Byars, 
1956) . 

Subjects  1  to  16  inclusive  had  "little  experience"  on  the  T600  and 
subjects  17  to  24  inclusive  had  "no  experience". 
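
The checklist scoring rule described above can be sketched as follows, assuming exactly one box is marked for each of the 13 statements. The function name is illustrative.

```python
def fatigue_score(better_than: int, same_as: int, worse_than: int, items: int = 13) -> int:
    """Score = 2 * ("better than" count) + ("same as" count); higher means less fatigue."""
    counts = (better_than, same_as, worse_than)
    if min(counts) < 0 or sum(counts) != items:
        raise ValueError("counts must be non-negative and sum to the number of items")
    return 2 * better_than + same_as

# Hypothetical checklist: 5 items "better than", 8 "same as", none "worse than".
example = fatigue_score(better_than=5, same_as=8, worse_than=0)
print(example)
```

Under this rule a 13-item checklist yields scores from 0 (every item marked "worse than") to 26 (every item marked "better than").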